mpirun(1) Last changed: 5-18-98
NAME
mpirun - Runs MPI programs

SYNOPSIS
mpirun [global_options] entry [:entry ...]

IMPLEMENTATION
UNICOS, UNICOS/mk, and IRIX systems
To launch MPI programs on IRIX and UNICOS systems, Array Services
software must be running.
Running MPI jobs in the background is not supported on IRIX and UNICOS
systems. For more information, see the NOTES section.

STANDARDS
This release implements the MPI 1.2 standard, as documented by the MPI
Forum in the spring 1997 release of MPI: A Message Passing Interface
Standard.
The MPI implementation for UNICOS/mk systems is derived from the
implementation of MPI for UNICOS MAX systems developed at Edinburgh
Parallel Computing Centre. The software is supplied to Cray Research
under license from the University of Edinburgh.

DESCRIPTION
The mpirun command is the primary job launcher for the Message Passing
Toolkit (MPT) implementations of MPI. The mpirun command must be used
when a user wants to run an MPI application on IRIX or UNICOS systems
(on IRIX systems, XMPI can be used in place of mpirun). On IRIX or
UNICOS systems, you can run an application on the local host only (the
host from which you issued mpirun) or distribute it to run on any
number of hosts that you specify.

Note: Use of the mpirun command is optional on UNICOS/mk systems,
where it currently supports only the -np option.

Several MPI implementations available today use a job launcher called
mpirun, and because this command is not part of the MPI standard, each
implementation's mpirun command differs in both syntax and
functionality.

The mpirun command accepts the following operands:

The global_options operand applies to all MPI executable files on all
specified hosts. Global options must be specified before local options
specific to a host (entry operands). The following global options are
supported:
Option                     Description

-a[rray] array_name        Specifies the array to use when launching an
                           MPI application. By default, Array Services
                           uses the default array specified in the Array
                           Services configuration file,
                           /usr/lib/array/arrayd.conf.
-cpr                       (Valid only on IRIX systems) Allows users to
                           checkpoint or restart MPI jobs that consist
                           of a single executable file running on a
                           single system. The absence of any host names
                           in the mpirun command indicates that a job is
                           running on a single system. For example, the
                           following command is valid:

                           mpirun -cpr -np 2 ./a.out > out 2>&1 < /dev/null

                           The following commands are not valid:

                           mpirun -cpr 2 ./a.out : 3 ./b.out
                           mpirun -cpr hosta -np 2 ./a.out > out 2>&1 < /dev/null

                           The first one is not valid because it
                           consists of more than one executable file
                           (a.out and b.out). The second one is not
                           valid because, even if submitted from hosta,
                           it specifies a host name.

                           For interactive users, the preferred method
                           of checkpointing the job is by ASH. This
                           ensures that all of the user's processes
                           specified in the mpirun command, plus daemons
                           associated with the job, will be
                           checkpointed. You can use the array(1)
                           command to find the ASH of a job.
                           Interactive users should also note that
                           stdin, stdout, and stderr should not be
                           connected to the terminal when this option is
                           being used.

                           Use of this option requires Array Services
                           3.1 or later.
-d[ir] path_name           Specifies the working directory for all
                           hosts. In addition to normal path names, the
                           following special values are recognized:

                           .   Translates into the absolute path name
                               of the user's current working directory
                               on the local host. This is the default.

                           ~   Specifies the use of the value of $HOME
                               as it is defined on each machine. In
                               general, this value can be different on
                               each machine.
-f[ile] file_name          Specifies a text file that contains mpirun
                           arguments.

-h[elp]                    Displays a list of options supported by the
                           mpirun command.
-miser                     (Valid only on IRIX systems) Allows MPI jobs
                           that run on a single system to be submitted
                           to miser. The absence of any host names in
                           the mpirun command indicates that a job is
                           running on a single system, and thus can be
                           submitted to miser. For example, the
                           following command is valid:

                           miser_submit -q queue -f file mpirun -miser 2 ./a.out : 3 ./b.out

                           The following command is not valid, even if
                           submitted on hosta:

                           miser_submit -q queue -f file mpirun -miser hosta 2 ./a.out

                           Use of this option requires Array Services
                           3.1 or later.
-p[refix] prefix_string    Specifies a string to prepend to each line of
                           output from stderr and stdout for each MPI
                           process. Some strings have special meaning
                           and are translated as follows:

                           * %g translates into the global rank of the
                             process producing the output. (This is
                             equivalent to the rank of the process in
                             MPI_COMM_WORLD.)

                           * %G translates into the number of processes
                             in MPI_COMM_WORLD.

                           * %h translates into the rank of the host on
                             which the process is running, relative to
                             the mpirun command line.

                           * %H translates into the total number of
                             hosts in the job.

                           * %l translates into the rank of the process
                             relative to other processes running on the
                             same host.

                           * %L translates into the total number of
                             processes running on the host.

                           * %@ translates into the name of the host on
                             which the process is running.
Note: For UNICOS implementations: Strings that specify a global or
local MPI process rank are not supported with the shared memory
version of MPI (-nt) on UNICOS systems. This means that of the
predefined strings available, %g and %l are not supported with -nt.
For examples of the use of these strings, first consider the following
code fragment:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    printf("Hello world\n");
    MPI_Finalize();
    return 0;
}
Depending on how this code is run, the results of running the mpirun
command will be similar to those in the following examples:

% mpirun -np 2 a.out
Hello world
Hello world

% mpirun -prefix ">" -np 2 a.out
>Hello world
>Hello world

% mpirun -prefix "%g" 2 a.out
0Hello world
1Hello world

% mpirun -prefix "[%g] " 2 a.out
[0] Hello world
[1] Hello world

% mpirun -prefix "<process %g out of %G> " 4 a.out
<process 1 out of 4> Hello world
<process 0 out of 4> Hello world
<process 3 out of 4> Hello world
<process 2 out of 4> Hello world

% mpirun -prefix "%@: " hosta,hostb 1 a.out
hosta: Hello world
hostb: Hello world

% mpirun -prefix "%@ (%l out of %L) %g: " hosta 2, hostb 3 a.out
hosta (0 out of 2) 0: Hello world
hosta (1 out of 2) 1: Hello world
hostb (0 out of 3) 2: Hello world
hostb (1 out of 3) 3: Hello world
hostb (2 out of 3) 4: Hello world

% mpirun -prefix "%@ (%h out of %H): " hosta,hostb,hostc 2 a.out
hosta (0 out of 3): Hello world
hostb (1 out of 3): Hello world
hostc (2 out of 3): Hello world
hosta (0 out of 3): Hello world
hostc (2 out of 3): Hello world
hostb (1 out of 3): Hello world
-v[erbose]                 Displays comments on what mpirun is doing
                           when launching the MPI application.
The entry operand describes a host on which to run a program, and the
local options for that host. You can list any number of entries on
the mpirun command line.

In the common case (Single Program Multiple Data (SPMD)), in which the
same program runs with identical arguments on each host, usually only
one entry needs to be specified.

Each entry has the following components:

* One or more host names (not needed if you run on the local host)
* Number of processes to start on each host
* Name of an executable program
* Arguments to the executable program (optional)

An entry has the following format:

host_list local_options program program_arguments
The host_list operand is either a single host (machine name) or a
comma-separated list of hosts on which to run an MPI program.

The local_options operand contains information that applies to a
specific host list. The following local options are supported:

Option                     Description

-f[ile] file_name          Specifies a text file that contains mpirun
                           arguments (same as global_options). For more
                           details, see the Using a File for mpirun
                           Arguments subsection on this man page.

-np np                     Specifies the number of processes on which
                           to run. (UNICOS/mk systems support only
                           this option.)

-nt nt                     On UNICOS systems, specifies the number of
                           tasks on which to run in a multitasking or
                           shared memory environment. On IRIX systems,
                           this option behaves the same as -np.

The program program_arguments operand specifies the name of the
program that you are running and its accompanying options.

Using a File for mpirun Arguments
Because the full specification of a complex job can be lengthy, on
UNICOS or IRIX systems, you can enter mpirun arguments in a file and
use the -f option to specify the file on the mpirun command line, as
in the following example:

mpirun -f my_arguments

The arguments file is a text file that contains argument segments.
White space is ignored in the arguments file, so you can include
spaces and newline characters for readability. An arguments file can
also contain additional -f options.
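As an illustrative sketch (the host names host_a and host_b, the
process counts, and the program names are hypothetical), an arguments
file equivalent to the command line
mpirun host_a -np 4 a.out : host_b -np 8 b.out
could be split across lines for readability:

```
host_a -np 4 a.out :
host_b -np 8 b.out
```

Because white space is ignored, the line break after the colon has no
effect on how the entries are parsed.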

Launching Programs on the Local Host
For testing and debugging, it is often useful to run an MPI program on
the local host only without distributing it to other systems. To run
the application locally, enter mpirun with the -np or -nt argument.
Your entry must include the number of processes to run and the name of
the MPI executable file.

The following command starts three instances of the application mtest,
which is passed an arguments list (arguments are optional):

mpirun -np 3 mtest 1000 "arg2"

You are not required to use a different host in each entry that you
specify on the mpirun command. You can launch a job that has two
executable files on the same host. On a UNICOS system, the following
example uses a combination of shared memory and TCP. On an IRIX
system, both executable files use shared memory.
mpirun host_a -np 6 a.out : host_a -nt 4 b.out

Running Programs in Shared Memory Mode
For running programs in MPI shared memory mode on a UNICOS or IRIX
single host, the format of the mpirun command is as follows:

mpirun -nt [nt] progname

The -nt option specifies the number of tasks for shared memory MPI,
and can be used on UNICOS systems only if you have compiled and linked
your program to preserve the message passing assumption that data is
private to each task. This is done by using the -htaskprivate $LIBCM
or -ataskcommon options. (For further information on compiling and
linking for shared memory MPI on UNICOS systems, see the MPI(1) man
page.) A single UNIX process is run with multiple tasks representing
MPI processes. The progname operand specifies the name of the program
that you are running and its accompanying options.

The -nt option to mpirun is supported on IRIX systems for consistency
across platforms. However, since the default mode of execution on a
single IRIX system is to use shared memory, the option behaves the
same as if you specified the -np option to mpirun. The following
example runs ten instances of a.out in shared memory mode on the
local host:

mpirun -nt 10 a.out

Using the mpirun Command on UNICOS/mk Systems
The mpirun command has been provided for consistency of use among
IRIX, UNICOS, and UNICOS/mk systems. Use of this command is optional,
however, on UNICOS/mk systems. If your program was built for a
specific number of PEs, the number of PEs specified on the mpirun
command line must match the number that was built into the program. If
it does not, mpirun issues an error message.
The following example shows how to invoke the mpirun command on a
program that was built for four PEs:

mpirun -np 4 a.out

Executing UNICOS/mk Programs Directly
Instead of using the mpirun command, you can choose to launch your MPI
programs on UNICOS/mk systems directly. If your UNICOS/mk program was
built for a specific number of PEs, you can execute it directly, as
follows:

./a.out

If your program was built as a malleable executable file (the number
of PEs was not fixed at build time, and the -Xm option was used
instead), you can execute it with the mpprun command. The following
example runs a program on a partition with four PEs:

mpprun -n 4 a.out

Launching a Distributed Program
You can use mpirun to launch a UNICOS or IRIX program that consists of
any number of executable files and processes and distribute it to any
number of hosts. A host is usually a single Origin, CRAY J90, or
CRAY T3E system, or can be any accessible computer running Array
Services software. Array Services software runs on IRIX and UNICOS
systems and must be running to launch MPI programs. For available
nodes on systems running Array Services software, see the
/usr/lib/array/arrayd.conf file.

You can list multiple entries on the mpirun command line. Each entry
contains an MPI executable file and a combination of hosts and process
counts for running it. This gives you the ability to start different
executable files on the same or different hosts as part of the same
MPI application.

The following examples show various ways to launch an application that
consists of multiple MPI executable files on multiple hosts.

The following example runs ten instances of the a.out file on host_a:

mpirun host_a -np 10 a.out

When specifying multiple hosts, the -np or -nt option can be omitted,
with the number of processes listed directly. On UNICOS systems, if
you omit the -np or -nt option, mpirun assumes -np and defaults to TCP
for communication. The following example launches ten instances of
fred on three hosts. fred has two input arguments.

mpirun host_a,host_b,host_c 10 fred arg1 arg2

The following example launches an MPI application on different hosts
with different numbers of processes and executable files, using an
array called test:

mpirun -array test host_a 6 a.out : host_b 26 b.out

The following example launches an MPI application on different hosts
out of the same directory on both hosts:

mpirun -d /tmp/mydir host_a 6 a.out : host_b 26 b.out

Job Control
It is possible to terminate, suspend, and/or resume an entire MPI
application (potentially running across multiple hosts) by using the
same control characters that work for serial programs. For example,
sending a SIGINT signal to mpirun terminates all processes in an MPI
job. Similarly, sending a SIGTSTP signal to mpirun suspends an MPI
job, and sending a SIGCONT signal resumes a job.

Troubleshooting
Problems you encounter when launching MPI jobs will typically result
in a could not run executable error message from mpirun. There are
many possible causes for this message, including (but not limited to)
the following reasons:

* The . is missing from the user's search path. This problem most
  commonly occurs when the -np syntax is used.

* No permission has been granted to launch processes on remote
  machine(s). You might need to set ~/.rhosts appropriately.

* The working directory is defaulting to $HOME instead of to $PWD on
  remote machines; use either MPI_DIR or the -d option.

* localhost does not appear in the /etc/hosts.equiv file (required for
  -np syntax).

* The Array Services daemon (arrayd) has been incorrectly configured;
  use ascheck to test your configuration.

* In general, if arshell fails, mpirun usually fails as well.

Limitations
The following practices will break the mpirun parser:

* Using machine names that are numbers (for example, 3, 127, and so
  on)

* Using MPI applications whose names match mpirun options (for
  example, -d, -f, and so on)

* Using MPI applications that use a colon (:) in their command lines.

NOTES
Running MPI jobs in the background is not supported on IRIX and UNICOS
systems.

The mpirun process is still connected to the tty when a job is placed
in the background. One of the things that mpirun polls for is input
from stdin. If it happens to be polling for stdin when a user types
in a window after putting an MPI job in the background, the job will
abort upon receiving a SIGTTIN signal. This behavior is intermittent,
depending on whether mpirun happens to be looking for and sees any
stdin input. Currently, there is no solution to this restriction, but
for a job that does not use stdin, you can use the following
workaround to launch the job:

To run an MPI job in the background, use a shell script to launch the
MPI job, then exit the invoked shell. This will exit the parent shell
and leave the child processes (mpirun and subsequently, the MPI
processes) running. Since the parent is gone, there will be no process
group associated with mpirun, and it will be cut off from the tty.
Example:
% cat runscript
#!/bin/sh
mpirun -np 2 ./a.out > out &
exit

RETURN VALUES
On exit, mpirun returns a status of zero unless it detected a problem,
in which case it returns a nonzero status (currently, all values are
1, but this might change in the future).

SEE ALSO
mpi(1), mpprun(1), mpt_intro(1)
termio(7)

This man page is available only online.